Data Mining on an OLTP System (Nearly) for Free (CMU-CS-99-151)
Authors
Abstract
This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of high-level functions to operate directly at individual disk drives. We show that such a scheme makes it possible to support a Data Mining workload on an OLTP system almost for free: there is only a small impact on the throughput and response time of the existing workload. Specifically, we show that an OLTP system has the disk resources to consistently provide one third of its sequential bandwidth to a background Data Mining task with close to zero impact on OLTP throughput and response time at high transaction loads. At low transaction loads, we show much lower impact than observed in previous work. This means that a production OLTP system can be used for Data Mining tasks without the expense of a second dedicated system. Our scheme takes advantage of close interaction with the on-disk scheduler by reading blocks for the Data Mining workload as the disk head “passes over” them while satisfying demand blocks from the OLTP request stream. We show that this scheme provides a consistent level of throughput for the background workload even at very high foreground loads. Such a scheme is of most benefit in combination with an Active Disk environment that allows the background Data Mining application to also take advantage of the processing power and memory available directly on the disk drives.
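The idea of reading background blocks as the head "passes over" them can be illustrated with a toy model. The sketch below is not the paper's scheduler: it assumes a single track, ignores seeks and transfer times, and uses hypothetical names (`serve_demand`, `run`, `wanted`). It only shows how sectors wanted by a background scan can be collected during the rotational wait before each foreground (OLTP) demand sector.

```python
def serve_demand(head, target, sectors_per_track, wanted):
    """Rotate from `head` to `target` on one track (toy model, assumption).

    Every sector that passes under the head before `target` and is still
    in `wanted` is read "for free" and removed from `wanted`.
    Returns (free_sectors, new_head_position).
    """
    free = []
    pos = head
    while pos != target:
        if pos in wanted:
            free.append(pos)
            wanted.discard(pos)
        pos = (pos + 1) % sectors_per_track
    # After reading the demand sector the head sits just past it.
    return free, (target + 1) % sectors_per_track


def run(demands, sectors_per_track, wanted, head=0):
    """Serve foreground demands in order; return background sectors picked up."""
    picked = []
    for target in demands:
        free, head = serve_demand(head, target, sectors_per_track, wanted)
        picked.extend(free)
    return picked


if __name__ == "__main__":
    background = {1, 2, 5, 7}          # sectors the mining scan still needs
    got = run([3, 6, 0], 8, background)
    print("free background sectors:", got)
    print("still wanted:", sorted(background))
```

In this model the foreground requests are never delayed, because background sectors are read only while the disk would otherwise be waiting for the demand sector to rotate under the head. A real drive would also have to account for seeks, head switches, and the order in which the background scan can consume blocks.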
Similar resources
Towards Higher Disk Head Utilization: Extracting "Free" Bandwidth From Busy Disk Drives (CMU-CS-00-130)
Freeblock scheduling is a new approach to utilizing more of a disk's potential media bandwidth. By filling rotational latency periods with useful media transfers, 20-50% of a never-idle disk's bandwidth can often be provided to background applications with no effect on foreground response times. This paper describes freeblock scheduling and demonstrates its value with simulation studies of two con...
Lachesis: Robust Database Storage Management Based on Device-specific Performance Characteristics (CMU-CS-03-124)
Database systems work hard to tune I/O performance, but do not always achieve the full performance potential of modern disk systems. Their abstracted view of storage components hides useful device-specific characteristics, such as disk track boundaries and advanced built-in firmware algorithms. This paper presents a new storage manager architecture, called Lachesis, that exploits and adapts to ...
Data Warehousing, Data Mining, OLAP and OLTP Technologies Are Indispensable Elements to Support Decision-Making Process in Industrial World
This paper provides an overview of Data warehousing, Data Mining, OLAP, OLTP technologies, exploring the features, new applications and the architecture of Data Warehousing and data mining. The data warehouse supports on-line analytical processing (OLAP), the functional and performance requirements of which are quite different from those of the online transaction processing (OLTP) applications ...
QoS Control Based on Query Response Time Prediction
User oriented Quality of Service (QoS) of On-Line Transaction Processing (OLTP) systems (or Data Warehouse (DW)) is determined with a response time, availability, consistency and currency. In this paper we consider the influence of the system load and the system throughput on the response time, as well as a possibility of the accurate response time prediction whereby that mechanism may be a fou...
Metadata Efficiency in a Comprehensive Versioning File System (CMU-CS-02-145)
Versioning file systems retain earlier versions of modified files, allowing recovery from user mistakes or system corruption. Unfortunately, conventional versioning systems do not efficiently record large numbers of versions. In particular, versioned metadata can consume as much space as versioned data. This paper examines two space-efficient metadata structures for versioning file systems and ...